Method and apparatus for determining hot user generated contents

ABSTRACT

According to an example, at least one hot account is determined for each category according to quality scores and correlation degrees of history user generated contents (UGCs); after a UGC newly posted by the hot account is received, if a quality score of the newly posted UGC is higher than a predefined quality score threshold and a correlation degree between the newly posted UGC and the category that the hot account belongs to is higher than a predefined correlation degree threshold, the newly posted UGC is determined as a hot UGC.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/CN2013/086839, filed on Nov. 11, 2013. This application claims the benefit and priority of Chinese Patent Application No. 201310007061.6, filed Jan. 9, 2013. The entire disclosures of each of the above applications are incorporated herein by reference.

FIELD

The present disclosure relates to data processing techniques, and more particularly, to a method and an apparatus for determining a hot user generated content (UGC).

BACKGROUND

At present, users are both browsers and creators of website contents. The contents created by network users are referred to as user generated content (UGC), e.g., microblogs posted by the users.

A website system on which users can post UGC is usually referred to as a UGC website system, e.g., a microblog system, social network service (SNS) system, social forum system, knowledge sharing system, etc. In the UGC website system, each user may post contents and there may be a large amount of UGCs on the UGC website. Thus, the UGC website system usually selects high quality UGC (also referred to as hot UGC) from the large amount of UGCs and recommends the selected high quality UGC to target users.

SUMMARY

According to an example of the present disclosure, a method for determining a hot user generated content (UGC) is provided. The method includes:

analyzing a history UGC posted by an account in a UGC website system, calculating a quality score of the history UGC posted by the account and a correlation degree between the history UGC and a category, determining a hot account for the category according to the quality score and correlation degree of the history UGC;

after receiving a UGC newly posted by the hot account, calculating a quality score of the newly posted UGC and a correlation degree between the newly posted UGC and the category that the hot account belongs to;

determining whether the quality score of the newly posted UGC is higher than a predefined quality score threshold and whether the correlation degree between the newly posted UGC and the category that the hot account belongs to is higher than a predefined correlation degree threshold of the category; and

determining, if the quality score of the newly posted UGC is higher than the predefined quality score threshold and the correlation degree between the newly posted UGC and the category that the hot account belongs to is higher than the predefined correlation degree threshold, that the newly posted UGC is a hot UGC.

According to another example of the present disclosure, an apparatus for determining a hot UGC is provided. The apparatus includes:

one or more processors;

a memory;

wherein one or more program modules are stored in the memory and to be executed by the one or more processors, the one or more program modules comprise:

a hot account determining module, configured to

-   analyze a history UGC posted by an account, calculate a quality score of the history UGC and a correlation degree between the history UGC and a category, and
-   determine, for the category, one or more accounts as hot accounts according to the quality score and the correlation degree of the history UGC; and

a hot UGC determining module, configured to

-   calculate, after receiving a UGC newly posted by the hot account, a quality score of the newly posted UGC and a correlation degree between the newly posted UGC and the category that the hot account belongs to;
-   determine whether the quality score of the newly posted UGC is higher than a predefined quality score threshold of the category and whether the correlation degree is higher than a predefined correlation degree threshold of the category; and
-   determine, if the quality score of the newly posted UGC is higher than the predefined quality score threshold of the category and the correlation degree is higher than the predefined correlation degree threshold of the category, the newly posted UGC as a hot UGC in the category that the hot account belongs to.

According to still another example of the present disclosure, a non-transitory computer-readable storage medium including a set of instructions for determining a hot UGC is provided, the set of instructions to direct at least one processor to perform acts of:

analyzing a history UGC posted by an account in a UGC website system, calculating a quality score of the history UGC posted by the account and a correlation degree between the history UGC and a category, determining a hot account for the category according to the quality score and correlation degree of the history UGC;

after receiving a UGC newly posted by the hot account, calculating a quality score of the newly posted UGC and a correlation degree between the newly posted UGC and the category that the hot account belongs to;

determining whether the quality score of the newly posted UGC is higher than a predefined quality score threshold and whether the correlation degree between the newly posted UGC and the category that the hot account belongs to is higher than a predefined correlation degree threshold of the category; and

if the quality score of the newly posted UGC is higher than the predefined quality score threshold and the correlation degree between the newly posted UGC and the category that the hot account belongs to is higher than the predefined correlation degree threshold, determining that the newly posted UGC is a hot UGC.

Other aspects or embodiments of the present disclosure can be understood by those skilled in the art in light of the description, the claims, and the drawings of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

Features of the present disclosure are illustrated by way of example and not limited in the following figures, in which like numerals indicate like elements, in which:

FIG. 1 is a schematic diagram illustrating an example embodiment of a computer system for implementing a method for determining a hot UGC.

FIG. 2 is a flowchart illustrating a method for determining a hot UGC according to an example of the present disclosure.

FIG. 3 is a flowchart illustrating a process of determining a hot account in block 201 of FIG. 2 according to an example of the present disclosure.

FIG. 4 is a schematic diagram illustrating a method for determining a hot UGC according to another example of the present disclosure.

FIG. 5 is a schematic diagram illustrating a method for determining a hot UGC according to still another example of the present disclosure.

FIG. 6 is a schematic diagram illustrating a method for determining a hot UGC applied in a microblog system according to an example of the present disclosure.

FIG. 7 is a schematic diagram illustrating an apparatus for determining a hot UGC according to an example of the present disclosure.

FIG. 8 is a schematic diagram illustrating an apparatus for determining a hot UGC according to another example of the present disclosure.

FIG. 9 is a schematic diagram illustrating an apparatus for determining a hot UGC according to still another example of the present disclosure.

DETAILED DESCRIPTION

The present disclosure will be described in further detail hereinafter with reference to accompanying drawings and examples to make the technical solution and merits therein clearer.

For simplicity and illustrative purposes, the present disclosure is described by referring to examples. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure. As used herein, the term “includes” means includes but not limited to, the term “including” means including but not limited to. The term “based on” means based at least in part on. In addition, the terms “a” and “an” are intended to denote at least one of a particular element.

In a UGC website system, each user may generate contents. Among these contents, there may be erroneous, fake or prejudiced contents. Therefore, the user generated contents should be filtered or selected. Thereafter, hot contents are selected and provided to target users, such that the target users can browse the contents they are interested in in a timely manner.

In an existing technique, the selected hot contents are provided to users as “hot microblogs”. In this technique, a microblog system classifies microblogs into different categories, such as “sports”, “finance and economics”, “shopping”, “news”, etc. In each category, one or more accounts are configured as hot accounts by a manager of the microblog system, e.g., according to the number of fans of the account. Microblogs posted by these hot accounts in one category during a period of time are sorted according to forwarding times and number of comments. In other words, the more times a microblog has been forwarded and the more comments it has received, the higher it ranks.

In the above technique, the hot account is configured according to the number of fans following this account. If the number of fans of an account exceeds a given number, the account is configured as a hot account. However, contents posted by an account having many fans are not always hot contents. Similarly, contents posted by an account having few fans are not necessarily low quality contents.

In addition, the above existing technique sorts the UGCs according to the forwarding times and the number of comments, but not according to the contents of the UGCs. Thus, the finally selected hot microblog may be less correlated to target users and the category that it belongs to. For example, a hot account in the “sports” category may post a hot microblog related to shopping. However, target users of the “sports” category are less interested in shopping.

Moreover, contents which have more forwarding times and comments are usually posted earlier. Newly posted contents generally have fewer forwarding times and comments. Therefore, in the above existing technique, newly posted contents have little possibility to be selected as high-quality contents, i.e., hot microblogs.

In contrast to this, an example of the present disclosure provides a method for determining a hot UGC. In the example of the present disclosure, a UGC website system analyzes history UGCs posted by each account to obtain a quality score of each history UGC and a correlation degree between the history UGC and each category. The UGC website system selects one or more hot accounts in each category according to quality scores and correlation degrees of the history UGCs.

After receiving a UGC newly posted by a hot account, the UGC website system calculates a quality score of the newly posted UGC and a correlation degree between the newly posted UGC and the category that the hot account belongs to. The UGC website system determines whether the quality score is higher than a predefined quality score threshold and whether the correlation degree is higher than a predefined correlation degree threshold of the category. If the quality score is higher than the predefined quality score threshold and the correlation degree is higher than the predefined correlation degree threshold of the category, the newly posted UGC is determined as a hot UGC in the category that the hot account belongs to.

FIG. 1 is a schematic diagram illustrating an example embodiment of a computer system for executing the method for determining a hot UGC. A computer system 100 may be a computing device capable of executing a method and apparatus of the present disclosure. The computer system 100 may, for example, be a device such as a server that provides service to users locally or via a network.

The computer system 100 may vary in terms of capabilities or features. Claimed subject matter is intended to cover a wide range of potential variations. For example, the computer system 100 may include or may execute a variety of operating systems 141. The computer system 100 may include or may execute a variety of possible applications 142, such as a hot UGC determining application 145.

Further, the computer system 100 may include one or more non-transitory processor-readable storage media 130 and one or more processors 122 in communication with the non-transitory processor-readable storage media 130. For example, the non-transitory processor-readable storage media 130 may be a RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of non-transitory storage medium known in the art. The one or more non-transitory processor-readable storage media 130 may store sets of instructions, or units and/or modules that comprise the sets of instructions, for conducting operations described in the present application. The one or more processors may be configured to execute the sets of instructions and perform the operations in example embodiments of the present application.

FIG. 2 is a schematic diagram illustrating a method for determining a hot UGC according to an example of the present disclosed hot UGC determining application 145. FIG. 2 is a simplified diagram according to one embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications.

As shown in FIG. 2, the method includes at least the following.

At block 201, a UGC website system analyzes history UGCs posted by each account to obtain a quality score of each history UGC and a correlation degree between the history UGC and each category. The UGC website system selects one or more hot accounts in each category according to quality scores and correlation degrees of the history UGCs.

This block may involve a large amount of calculations. Thus, this block may be performed offline.

FIG. 3 is a flowchart illustrating a process of obtaining one or more hot accounts in block 201 according to an example of the present disclosure.

As shown in FIG. 3, the process includes the following. In the example, it is possible to consider only original UGCs posted by each account.

At block 211, one or more original UGCs posted by each account during a period of time (e.g., last two months) are obtained.

At block 212, for each original UGC, a quality score of the original UGC and a correlation degree between the original UGC and each category are calculated.

At block 213, for each account, an average quality score of the account and an average correlation degree between the account and each category are calculated according to the quality scores of the original UGCs and the correlation degrees between the original UGCs and each category, wherein

${{{{{the}\mspace{14mu} {average}\mspace{14mu} {quality}\mspace{14mu} {score}\mspace{14mu} {of}\mspace{14mu} {an}\mspace{14mu} {account}} = \frac{\begin{matrix}{a\mspace{14mu} {sum}\mspace{14mu} {of}\mspace{14mu} {quality}\mspace{14mu} {scores}\mspace{14mu} {of}\mspace{14mu} {the}} \\{{original}\mspace{14mu} {UGCs}\mspace{14mu} {posted}\mspace{14mu} {by}\mspace{14mu} {the}\mspace{14mu} {account}}\end{matrix}}{{the}\mspace{14mu} {number}\mspace{14mu} {of}\mspace{14mu} {the}\mspace{14mu} {original}\mspace{14mu} {UGCs}\mspace{14mu} {posted}\mspace{14mu} {by}\mspace{14mu} {the}\mspace{14mu} {account}}};}{the}\mspace{14mu} {average}\mspace{14mu} {correlation}\mspace{14mu} {degree}\mspace{14mu} {between}\mspace{14mu} {an}\mspace{14mu} {account}\mspace{14mu} {and}\mspace{14mu} a\mspace{14mu} {category}} = {\frac{\begin{matrix}{a\mspace{14mu} {sum}\mspace{14mu} {of}\mspace{14mu} {correlation}\mspace{14mu} {degrees}\mspace{14mu} {between}\mspace{14mu} {the}} \\{{original}\mspace{14mu} {UGCs}\mspace{14mu} {and}\mspace{14mu} {the}\mspace{14mu} {category}}\end{matrix}}{{the}\mspace{14mu} {number}\mspace{14mu} {of}\mspace{14mu} {the}\mspace{14mu} {original}\mspace{14mu} {UGCs}\mspace{14mu} {posted}\mspace{14mu} {by}\mspace{14mu} {the}\mspace{14mu} {account}}.}$

At block 214, for each account, a category that a highest correlation degree of the account corresponds to is selected as a category that the account belongs to.

In block 213, for each account, a correlation degree between the account and each category is calculated. Thus, one account may correspond to one correlation degree in each category. Therefore, in block 214, the category that the highest correlation degree of the account corresponds to is selected as the category that the account belongs to.

At block 215, for each account, it is determined whether the average quality score of the account is higher than a predefined average quality score threshold of the category that the account belongs to and whether an average correlation degree between the account and the category that the account belongs to is higher than a predefined average correlation degree threshold. If the average quality score of the account is higher than the predefined average quality score threshold of the category that the account belongs to and the average correlation degree between the account and the category that the account belongs to is higher than the predefined average correlation degree threshold, the account is determined as a hot account in the category that the account belongs to. Otherwise, the account is not a hot account.

As described above, the quality score of each original UGC and the correlation degree between the original UGC and each category are important parameters for determining a hot account. Based on the above two parameters, i.e., the quality score and the correlation degree, another parameter may be generated and act as a basis for determining a hot account.

For example, in block 212, after the quality score of each original UGC and the correlation degree between the original UGC and each category are calculated, it is possible to multiply the quality score of the original UGC by the correlation degree between the original UGC and each category to obtain a reliability degree of the original UGC in each category. The reliability degree is a derived parameter which may be used as a basis for determining the hot account.

At this time, block 213 further includes: calculating an average reliability degree of each account with respect to each category according to the reliability degrees of original UGCs posted by the account in each category, wherein

${{the}\mspace{14mu} {average}\mspace{14mu} {reliability}\mspace{14mu} {degree}\mspace{14mu} {of}\mspace{14mu} {an}\mspace{14mu} {account}\mspace{14mu} {in}\mspace{14mu} a\mspace{14mu} {category}} = \frac{\begin{matrix}{a\mspace{14mu} {sum}\mspace{14mu} {of}\mspace{14mu} {reliability}\mspace{14mu} {degrees}\mspace{14mu} {of}\mspace{14mu} {the}\mspace{14mu} {original}\mspace{14mu} {UGCs}} \\{{posted}\mspace{14mu} {by}\mspace{14mu} {the}\mspace{14mu} {account}\mspace{14mu} {in}\mspace{14mu} {the}\mspace{14mu} {category}}\end{matrix}}{{the}\mspace{14mu} {number}\mspace{14mu} {of}\mspace{14mu} {o{riginal}}\mspace{14mu} {UGCs}\mspace{14mu} {posted}\mspace{14mu} {by}\mspace{14mu} {the}\mspace{14mu} {account}}$

In addition, block 215 further includes: for each account, after it is determined that the average quality score of the account is higher than the predefined average quality score threshold and the average correlation degree between the account and the category that the account belongs to is higher than the predefined average correlation degree threshold, it is further determined whether the average reliability degree of the account in the category is higher than a predefined average reliability degree threshold. If yes, the account is determined as a hot account. Otherwise, the account is not a hot account.

Through the above block 201, one or more accounts may be determined as hot accounts in one category.
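Blocks 213 to 215 can be illustrated with a minimal Python sketch. It is only one possible reading of the description: the function name `find_hot_accounts`, the input layout (tuples of account, quality score and a per-category correlation mapping) and the per-category threshold dictionaries are assumptions made for illustration, and a category missing from a UGC's mapping is assumed to have a correlation degree of 0.

```python
from collections import defaultdict
from statistics import mean


def find_hot_accounts(posts, quality_thr, corr_thr, rel_thr=None):
    """Return {account: category} for accounts judged hot.

    posts: iterable of (account, quality_score, {category: correlation_degree}).
    quality_thr, corr_thr, rel_thr: hypothetical per-category threshold dicts;
    rel_thr is optional and enables the reliability-degree check.
    """
    by_account = defaultdict(list)
    for account, quality, corr in posts:
        by_account[account].append((quality, corr))

    hot = {}
    for account, items in by_account.items():
        # Block 213: average quality score and average correlation per category.
        avg_quality = mean(q for q, _ in items)
        categories = {c for _, corr in items for c in corr}
        if not categories:
            continue
        avg_corr = {
            c: mean(corr.get(c, 0.0) for _, corr in items) for c in categories
        }
        # Block 214: the account belongs to its best-correlated category.
        category = max(avg_corr, key=avg_corr.get)
        # Block 215: both averages must exceed the category thresholds.
        if avg_quality <= quality_thr[category]:
            continue
        if avg_corr[category] <= corr_thr[category]:
            continue
        # Optional check: average of (quality score x correlation degree).
        if rel_thr is not None:
            avg_rel = mean(q * corr.get(category, 0.0) for q, corr in items)
            if avg_rel <= rel_thr[category]:
                continue
        hot[account] = category
    return hot
```

Because this step iterates over every account's history UGCs, it is the part of the method that the description suggests running offline.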

At block 202, after receiving a UGC newly posted by a hot account, the UGC website system calculates a quality score of the newly posted UGC and a correlation degree between the newly posted UGC and the category that the hot account belongs to. The UGC website system determines whether the quality score is higher than a predefined quality score threshold and whether the correlation degree is higher than a predefined correlation degree threshold of the category. If the quality score is higher than the predefined quality score threshold and the correlation degree is higher than the predefined correlation degree threshold of the category, the newly posted UGC is determined as a hot UGC in the category that the hot account belongs to.

In one example, the UGC website system may execute block 202 each time it receives a UGC newly posted by a hot account. Alternatively, the UGC website system may also execute block 202 periodically, i.e., after a certain period of time (e.g., every 10 minutes). At this time, the UGC website system executes block 202 to process each UGC newly posted during this period of time.
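The decision in block 202 can be sketched as follows. This is a hedged illustration rather than the disclosed implementation: `select_hot_ugcs`, the `(account, text)` input layout and the `score` and `correlate` callables are hypothetical names, and the scoring functions are passed in so that any calculation manner can be plugged in.

```python
def select_hot_ugcs(new_ugcs, hot_accounts, score, correlate,
                    quality_thr, corr_thr):
    """Return the newly posted UGCs that qualify as hot UGCs.

    new_ugcs: iterable of (account, text) received since the last run.
    hot_accounts: {account: category}, e.g. the result of block 201.
    score(text) and correlate(text, category): caller-supplied functions.
    quality_thr, corr_thr: hypothetical per-category threshold dicts.
    """
    hot = []
    for account, text in new_ugcs:
        category = hot_accounts.get(account)
        if category is None:
            continue  # only UGCs posted by hot accounts are considered
        if (score(text) > quality_thr[category]
                and correlate(text, category) > corr_thr[category]):
            hot.append((account, text, category))
    return hot
```

The same function serves both execution modes described above: it can be called with a single UGC on arrival, or with the batch accumulated over a period such as 10 minutes.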

In the above blocks 201 and 202, the quality score of a UGC has to be calculated. In block 201, the quality score of a history UGC is calculated. In block 202, the quality score of a newly posted UGC is calculated. The calculation of the quality score in blocks 201 and 202 may be performed in the same manner or in different manners. Hereinafter, one exemplary calculation manner is provided. Those with ordinary skill in the art may have other calculation manners to calculate the quality score of the history UGC or the newly posted UGC, which is not restricted in the present disclosure.

A total text length, number of words, number of filtered words and number of punctuations in a UGC are obtained. The number of filtered words refers to the number of words which match predefined filtering words.

The number of effective words of the UGC is determined, wherein

the  number  of  effective  words = total  number  of  words − number  of  filtered  words − number  of  punctuations.

A text basic score of the UGC is determined, wherein

the  text  basic  score = w 5 × number  of  effective  words + w 6 × number  of  filtered  words,  

w5 and w6 are weight parameters which may be determined based ontraining data.

A number of repeated words of the UGC and a word repetition ratio aredetermined; wherein

${{the}\mspace{14mu} {word}\mspace{14mu} {repetition}\mspace{14mu} {ratio}} = {\frac{{number}\mspace{14mu} {of}\mspace{14mu} {repeated}\mspace{14mu} {words}}{{total}\mspace{14mu} {number}\mspace{14mu} {of}\mspace{14mu} {words}}.}$

A text score of the UGC is determined; wherein

${{{the}\mspace{14mu} {text}\mspace{14mu} {score}} = {{text}\mspace{14mu} {basic}\mspace{14mu} {score} \times \frac{{number}\mspace{14mu} {of}\mspace{14mu} {effective}\mspace{14mu} {words}}{{total}\mspace{14mu} {number}\mspace{14mu} {of}\mspace{14mu} {words}} \times f\; 1 \times {\left( {1 - {{word}\mspace{14mu} {repetition}\mspace{14mu} {ratio}}} \right)/w}\; 4}},$

wherein f1 is a predefined function taking the number of punctuationsand the total number of words as input parameters, w4 is a weightparameter.

A posted time of the UGC is obtained and a time score of the UGC iscalculated, wherein

${{{the}\mspace{14mu} {time}\mspace{14mu} {score}} = \frac{{{posted}\mspace{14mu} {time}\mspace{14mu} {of}\mspace{14mu} {the}\mspace{14mu} {UGC}} - {{predefined}\mspace{14mu} {reference}\mspace{14mu} {time}}}{w\; 7}},$

wherein w7 is a weight parameter.

The quality score of the UGC is determined, wherein

the quality score=w1×(w2×text score+w3×time score),

wherein w1, w2 and w3 are weight parameters.

Now, through the above process, the quality score of each UGC (e.g., a history UGC or a newly posted UGC) is calculated.
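The exemplary formulas above can be combined into a single function, sketched below in Python. The function and parameter names are chosen for illustration only; the weights w1 to w7 and the function f1 are left as arguments because the description states that they are predefined or obtained from training data. Rejecting a UGC with no words is an added assumption to avoid a division by zero.

```python
def quality_score(total_words, filtered_words, punctuations, repeated_words,
                  posted_time, reference_time, f1,
                  w1, w2, w3, w4, w5, w6, w7):
    """Quality score of one UGC following the exemplary formulas.

    f1 is a caller-supplied function f1(punctuations, total_words).
    """
    if total_words <= 0:
        # Assumption: a UGC without words cannot be scored.
        raise ValueError("total_words must be positive")

    effective_words = total_words - filtered_words - punctuations
    text_basic_score = w5 * effective_words + w6 * filtered_words
    repetition_ratio = repeated_words / total_words

    text_score = (text_basic_score
                  * effective_words / total_words
                  * f1(punctuations, total_words)
                  * (1 - repetition_ratio) / w4)
    time_score = (posted_time - reference_time) / w7

    return w1 * (w2 * text_score + w3 * time_score)
```

With f1 fixed at 1, all weights set to 1 except w6 = 0 and w7 = 100, a UGC of 10 words containing 1 filtered word, 1 punctuation and 2 repeated words, posted 100 time units after the reference time, has 8 effective words, a text score of 8 × 0.8 × 0.8 = 5.12 and a time score of 1, giving a quality score of 6.12.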

Besides the quality score, in blocks 201 and 202, a correlation degree between the UGC and a category is also required to be calculated. Specifically, in block 201, the correlation degree between the history UGC and each category is calculated. In block 202, the correlation degree between a newly posted UGC and the category that the hot account which posts the new UGC belongs to is calculated. It should be noted that the correlation degree may be calculated in the same manner or in different manners in blocks 201 and 202. One exemplary calculation manner of the correlation degree is described in the following. Those with ordinary skill in the art may have other calculation manners to determine the correlation degree, which is not restricted in the present disclosure.

One exemplary formula is as follows:

Correlation degree=W1*F1(weight)+W2*F2(rate)+W3*F3(rank).

W1, W2 and W3 are three weight parameters.

Weight denotes weight of the category.

Rate denotes a value that the weight of the category is divided by atotal weight.

Rank denotes a ranking position of the category in all categories.

F1 denotes a function for normalizing the weight to 0-1.

F2 denotes a function for normalizing the rate to 0-1.

F3 denotes a function for normalizing the rank to 0-1.

Through the above blocks 201 and 202, it is possible to determine one or more hot accounts according to quality scores of history UGCs and correlation degrees between the history UGCs and the categories. Compared with the existing technique in which the hot account is determined according to number of fans or other subjective factors (e.g., configured by a network manager manually), the method provided by the example of the present disclosure determines the hot accounts based on the contents of the UGC posted by all accounts. The determination is more objective. In addition, the contents of the hot UGC selected from the UGCs posted by these hot accounts have a high correlation degree with contents that the users are interested in, and also have a high correlation degree with the category that it belongs to. Moreover, the method provided by the example of the present disclosure is capable of performing the selection operation after a newly posted UGC is received. Thus, the hot UGC may be provided to users rapidly.

FIG. 4 is a flowchart illustrating a method for determining a hot UGC according to another example of the present disclosure.

As shown in FIG. 4, the method includes the following.

Block 401 is the same as block 201.

At block 402, for a newly posted UGC, it is determined whether the UGC contains a word which is in a predefined blacklist. If yes, the UGC is removed at block 403, i.e., it is not considered and no further calculation is performed on this UGC. Otherwise, block 404 is performed.

Through blocks 402 and 403, it is possible to remove UGC containing words which are in the blacklist. The quality of the hot UGC may be increased. The number of candidate UGCs may be reduced, which reduces the workload of subsequent calculation.
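The blacklist check of blocks 402 and 403 can be sketched as a simple pre-filter. The description does not specify how a UGC is split into words, so the sketch below assumes a lowercase blacklist and a regular-expression tokenizer; both `contains_blacklisted_word` and `remove_blacklisted` are hypothetical helper names.

```python
import re


def contains_blacklisted_word(text, blacklist):
    """True if any word of the UGC text is in the blacklist.

    Assumption: words are runs of word characters, compared in lowercase,
    and the blacklist holds lowercase words.
    """
    words = set(re.findall(r"\w+", text.lower()))
    return not words.isdisjoint(blacklist)


def remove_blacklisted(ugcs, blacklist):
    """Blocks 402-403: keep only UGC texts free of blacklisted words."""
    return [t for t in ugcs if not contains_blacklisted_word(t, blacklist)]
```

Matching is on whole words, so a blacklist entry such as "spam" does not remove a UGC that merely contains "spammer"; a language without whitespace-delimited words would need a word segmentation step instead of the regular expression.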

Block 404 is the same as block 202.

It should be noted that the calculation of the quality score of the newly posted UGC and the correlation degree between the newly posted UGC and the category in block 202 may be performed each time a newly posted UGC is received or periodically (e.g., every 10 minutes). If the calculation is performed periodically, a repetition removing operation may be performed before the quality score and the correlation degree are calculated.

FIG. 5 is a flowchart illustrating a method for determining a hot UGC according to still another example of the present disclosure. As shown in FIG. 5, the method includes the following.

Block 501 is the same as block 201.

At block 502, it is determined whether at least two UGCs newly posted by a hot account are received. If yes, block 503 is performed. Otherwise, block 504 is performed.

At block 503, a text similarity degree between the newly posted UGCs is calculated. For UGCs having a text similarity degree higher than a predefined threshold, the UGC which is posted later is removed, i.e., only the UGC which is posted earlier is reserved.

Thus, the following calculation is only performed for the reserved UGC. The number of candidate UGCs is reduced and the workload of the subsequent calculation is reduced.

The calculation of the text similarity degree between the newly posted UGCs may be as follows: perform a word segmentation operation on each newly posted UGC to obtain notional words (i.e., words having meanings themselves), and calculate a notional word repetition ratio between each two UGCs. The notional word repetition ratio is the text similarity degree. For two UGCs having a notional word repetition ratio higher than a predefined threshold, only the UGC which is posted earlier is reserved for further processing.
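The repetition removing operation of block 503 can be sketched as below. The description does not define the notional word repetition ratio precisely, so this sketch makes two labeled assumptions: notional words are approximated by dropping a small hypothetical stop-word list after regular-expression tokenization, and the ratio is taken as the number of shared notional words divided by the number of distinct notional words in both UGCs.

```python
import re

# Hypothetical stop-word list standing in for "non-notional" words.
STOP_WORDS = frozenset(
    {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "are"}
)


def notional_words(text):
    """Approximate word segmentation: lowercase tokens minus stop words."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS}


def repetition_ratio(text_a, text_b):
    """Assumed definition: shared notional words / all distinct notional words."""
    words_a, words_b = notional_words(text_a), notional_words(text_b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def drop_near_duplicates(ugcs, threshold):
    """Keep the earliest UGC of each group of similar UGCs.

    ugcs: list of (posted_time, text) newly posted by one hot account.
    """
    kept = []
    for posted_time, text in sorted(ugcs, key=lambda u: u[0]):
        if all(repetition_ratio(text, k[1]) <= threshold for k in kept):
            kept.append((posted_time, text))
    return kept
```

Processing the UGCs in order of posted time means that a later UGC is compared only against earlier ones already reserved, which matches the rule that the earlier UGC is kept.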

Block 504 is the same as block 202.

In examples of the present disclosure, the UGC website system may be a microblog system, a social network service (SNS) system, a social forum system, a knowledge sharing system, etc. Hereinafter, the microblog system is taken as an example to describe an implementation of the present disclosure. In the following, the microblog is the UGC described in the above examples.

FIG. 6 is a flowchart illustrating a method for determining a hot UGC applied in a microblog system according to an example of the present disclosure. As shown in FIG. 6, the method includes the following.

At block 601, one or more hot accounts in each category are determined. This block may specifically include the following blocks 611 to 615.

At block 611, original microblogs posted by each account within a certain period (e.g., last two months) are obtained.

For example, microblogs in following table 1 are obtained.

TABLE 1 account index Microblog contents a 1 Dave noted on Tuesday thatprogress has seemingly been halted in the ongoing labor negotiationsbetween the NBA and the National Basketball Players Association. Here'sa look at how some of the players competing at Impact Basketball's“Lockout League” in Las Vegas took the news. There's definitely somerising frustration with the lack of progress. 2 Spears's aunt SandraBridges Covington, with whom she had been very close, died of ovariancancer in January. In February 2007, Spears stayed in a drugrehabilitation facility in Antigua for less than a day. The followingnight, she shaved her head with electric clippers at a hair salon inTarzana, Los Angeles. b 3 The Su-27 is a highly integrated twin-finnedaircraft. The airframe is constructed of titanium and high-strengthaluminum alloys. The engine nacelles are fitted with trouser fairings toprovide a continuous streamlined profile between the nacelles and thetail beams. The fins and horizontal tail consoles are attached to tailbeams. 4 Scorpios love competition in both work and play, which is whythey'll air it out in sports and games. Extreme sports are right upScorpio's alley, as is most anything that will test their mettle.They've got to have an adversary, since it makes the game that much morefun. Scorpio's colors? Powerful red and serious black. When it comes tolove, though, Scorpios soften up a bit and are caring and devoted withtheir lovers, even if they do hold on a bit tight. 5 NBA Players inVegas React To Lockout Negotiations Stalling In NYC

At block 612, a quality score of each original microblog, a correlation degree between each original microblog and each category and a reliability degree of each original microblog in each category are calculated.

Suppose that a formula for calculating the quality score is as follows:

Quality score = 700000 * (0.5 * text score + 0.4 * (posted time of the microblog − 1293811200) / w7), wherein w7 = 3600 * 87600;

-   -   wherein text score = (total text length + 5 * (total number of words − number of filtered words − number of punctuations) − 20 * number of filtered words) * (total number of words − number of filtered words − number of punctuations) / total number of words * f1(number of punctuations, total number of words) * (1 − number of repeated words / total number of words) / 840.

The function f1 may be obtained by analyzing training data. An example is as follows.

The value of f1 is 1 in default.

If the number of punctuations is 0, f1=0.3 if the total length is larger than 300, f1=0.6 if the total length is larger than 100, and f1=0.88 if the total length is larger than 70.

If the number of punctuations is larger than 40, f1=0.74.

If the number of punctuations is larger than 30, f1=0.82.

If the number of punctuations is larger than 20, f1=0.92.

If a quotient obtained by dividing the number of punctuations by the total length is smaller than 0.03, f1=0.73.

If a quotient obtained by dividing the number of punctuations by the total length is smaller than 0.05, f1=0.9.

Herein, suppose that a formula for calculating the correlation degree between a microblog and a category is as follows.

Correlation degree=0.2*F1 (weight)+0.6*F2 (rate)+0.2*F3 (rank).

F1 is defined as follows:

If weight>3, F1=1;

Otherwise, F1=pow (weight/3, 0.2).

F2 is defined as follows:

If rate>0.5, F2=1;

Otherwise, F2=pow (rate/0.5, 0.4).

F3 is defined as follows:

If rank>10, F3=0;

Otherwise, F3=pow ((11.0−rank)/10.0, 1.5).
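The correlation degree formula and its three normalizing functions can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the disclosure: the function names `norm_weight`, `norm_rate`, `norm_rank` and `correlation_degree` are hypothetical, and non-negative inputs are assumed.

```python
def norm_weight(weight: float) -> float:
    """F1: normalize the category weight to the range 0-1."""
    return 1.0 if weight > 3 else (weight / 3) ** 0.2


def norm_rate(rate: float) -> float:
    """F2: normalize the rate (category weight / total weight) to 0-1."""
    return 1.0 if rate > 0.5 else (rate / 0.5) ** 0.4


def norm_rank(rank: int) -> float:
    """F3: normalize the ranking position to 0-1 (rank 1 is the best)."""
    return 0.0 if rank > 10 else ((11.0 - rank) / 10.0) ** 1.5


def correlation_degree(weight: float, rate: float, rank: int) -> float:
    """Correlation degree = 0.2*F1(weight) + 0.6*F2(rate) + 0.2*F3(rank)."""
    return 0.2 * norm_weight(weight) + 0.6 * norm_rate(rate) + 0.2 * norm_rank(rank)
```

Each term is capped at 1, so the correlation degree stays between 0 and 1, and the rate term carries the largest share (0.6) of the result.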

Hereinafter, microblog 1 is taken as an example to describe the calculation of the quality score, the correlation degree and the reliability degree.

(1) The calculation of the quality score of the microblog 1.

The total text length of microblog 1 is 134, the total number of words is 35, the number of punctuations is 9, the number of filtered words is 0, and the number of repeated words is 0.

The text score of microblog 1 = (134+5*(35−0−9)−20*0)*(35−0−9)/35*1*(1−0/35)/840 = 0.233469.

The time score of microblog 1 = (1354621754−1293811200)/3600/87600 = 0.192829.

The quality score of microblog 1 = 700000*(0.5*text score+0.4*time score) = 700000*(0.5*0.233469+0.4*0.192829) = 135706.
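The arithmetic for microblog 1 can be reproduced with a short Python sketch. The function and constant names are illustrative only. The value of f1 is passed in as a plain number and defaults to 1, as in the worked example, so the f1 rules themselves are not modeled here.

```python
REFERENCE_TIME = 1293811200  # reference timestamp used in the formula
W7 = 3600 * 87600            # time normalization weight w7


def text_score(total_length, total_words, filtered_words,
               punctuations, repeated_words, f1=1.0):
    """Text score as given by the formula of block 612."""
    effective = total_words - filtered_words - punctuations
    basic = total_length + 5 * effective - 20 * filtered_words
    repetition = repeated_words / total_words
    return basic * effective / total_words * f1 * (1 - repetition) / 840


def time_score(posted_time):
    """Time score: offset from the reference time, divided by w7."""
    return (posted_time - REFERENCE_TIME) / W7


def quality_score(text, time):
    """Quality score = 700000 * (0.5 * text score + 0.4 * time score)."""
    return 700000 * (0.5 * text + 0.4 * time)


# Microblog 1: length 134, 35 words, 0 filtered, 9 punctuations, 0 repeated.
ts = text_score(134, 35, 0, 9, 0)
tm = time_score(1354621754)
q = quality_score(ts, tm)
print(round(ts, 6), round(tm, 6), int(q))  # 0.233469 0.192829 135706
```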

(2) The calculation of the correlation degree between microblog 1 and each category.

A weight of each word in each category may be obtained through a training method such as term frequency-inverse document frequency (TF-IDF). Then a word classification table with weights is obtained. According to the word classification table, the weight of each word segmented from the microblog in each category may be obtained. For example, the weight of each word segmented from microblog 1 in each category is as shown in table 2.

TABLE 2

| word | category | weight |
|---|---|---|
| nba | basketball | 1.000000 |
| lockout | basketball | 0.529648 |
| player | basketball | 0.205528 |
| player | football | 0.197445 |
| disclose | news | 0.120000 |
| message | news | 0.120000 |
| price | shopping | 0.100000 |
| accept | quotation | 0.100000 |
| disclose | military | 0.100000 |
| news | military | 0.100000 |

According to table 2 and the correlation degree formula 0.2*F1 (weight)+0.6*F2 (rate)+0.2*F3 (rank), a correlation degree between microblog 1 and each category may be obtained, as shown in table 3.

TABLE 3

| category | weight | rate | rank | Correlation degree |
|---|---|---|---|---|
| basketball | 1.735176 | 0.208239 | 1 | 0.990000 |
| news | 0.240000 | 0.028802 | 2 | 0.306711 |
| military | 0.200000 | 0.024002 | 3 | 0.256398 |
| football | 0.197445 | 0.023695 | 4 | 0.228975 |
| quotation | 0.100000 | 0.012001 | 5 | 0.149597 |
| shopping | 0.100000 | 0.012001 | 6 | 0.127356 |

(3) The calculation of the reliability degree of microblog 1 in each category. The following formula may be used: reliability degree = quality score * correlation degree of the microblog in the category. A calculated result may be as shown in table 4.

TABLE 4

| category | Correlation degree | reliability |
|---|---|---|
| basketball | 0.990000 | 134348 |
| news | 0.306711 | 41622 |
| military | 0.256398 | 34794 |
| football | 0.228975 | 31073 |
| quotation | 0.149597 | 20301 |
| shopping | 0.127356 | 17282 |
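The reliability values of table 4 can be checked with a few lines of Python. Truncating the product to an integer is an assumption inferred from the listed values, since the text does not state a rounding rule. The variable names are illustrative.

```python
quality = 135706  # quality score of microblog 1

# Correlation degree of microblog 1 in each category (from table 3).
correlation = {
    "basketball": 0.990000,
    "news": 0.306711,
    "military": 0.256398,
    "football": 0.228975,
    "quotation": 0.149597,
    "shopping": 0.127356,
}

# reliability degree = quality score * correlation degree
# (truncated to an integer: an assumption, not stated in the text)
reliability = {cat: int(quality * deg) for cat, deg in correlation.items()}
print(reliability)
```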

Based on the above calculations of the quality score, the correlation degree and the reliability degree, a result as shown in table 5 may be obtained.

TABLE 5

| account | index | Microblog contents | Quality score | Related category | Correlation degree | Reliability degree |
|---|---|---|---|---|---|---|
| a | 1 | Dave noted on Tuesday that progress has seemingly been halted in the ongoing labor negotiations between the NBA and the National Basketball Players Association. Here's a look at how some of the players competing at Impact Basketball's “Lockout League” in Las Vegas took the news. There's definitely some rising frustration with the lack of progress. | 135706 | basketball | 0.990000 | 134348 |
| | | | | Current events | 0.306711 | 41622 |
| | | | | Military | 0.256398 | 34794 |
| | | | | Football | 0.228975 | 31073 |
| | | | | Quotation | 0.149597 | 20301 |
| | | | | shopping | 0.127356 | 17282 |
| a | 2 | Spears's aunt Sandra Bridges Covington, with whom she had been very close, died of ovarian cancer in January. In February 2007, Spears stayed in a drug rehabilitation facility in Antigua for less than a day. The following night, she shaved her head with electric clippers at a hair salon in Tarzana, Los Angeles. | 164149 | basketball | 0.930000 | 152658 |
| | | | | auto | 0.510763 | 83841 |
| | | | | digital | 0.483108 | 79301 |
| | | | | family | 0.261693 | 42956 |
| | | | | quotation | 0.177987 | 29216 |
| | | | | telecom | 0.155746 | 25565 |
| | | | | history | 0.120424 | 19767 |
| | | | | fun | 0.092388 | 15165 |
| | | | | politics | 0.060536 | 9936 |
| | | | | Love | 0.048842 | 8017 |
| | | | | Current events | 0.021259 | 3489 |
| b | 3 | The Su-27 is a highly integrated twin-finned aircraft. The airframe is constructed of titanium and high-strength aluminum alloys. The engine nacelles are fitted with trouser fairings to provide a continuous streamlined profile between the nacelles and the tail beams. The fins and horizontal tail consoles are attached to tail beams. | 130740 | military | 1.000000 | 130740 |
| | | | | politics | 0.327131 | 42769 |
| | | | | Science | 0.230002 | 30070 |
| | | | | work | 0.184141 | 24074 |
| | | | | Auto | 0.143291 | 18733 |
| | | | | Foreign language | 0.109144 | 14269 |
| | | | | IT | 0.077400 | 10119 |
| | | | | travel | 0.049616 | 6486 |
| b | 4 | Scorpios love competition in both work and play, which is why they'll air it out in sports and games. Extreme sports are right up Scorpio's alley, as is most anything that will test their mettle. They've got to have an adversary, since it makes the game that much more fun. Scorpio's colors? Powerful red and serious black. When it comes to love, though, Scorpios soften up a bit and are caring and devoted with their lovers, even if they do hold on a bit tight. | 149081 | constellation | 0.980000 | 146099 |
| | | | | Love | 0.430763 | 64218 |
| | | | | quotation | 0.397950 | 59326 |
| | | | | Fun | 0.229745 | 34250 |
| | | | | beauty | 0.186795 | 27833 |
| | | | | Current events | 0.127017 | 18933 |
| b | 5 | NBA Players In Vegas React To Lockout Negotiations Stalling In NYC | 128133 | basketball | 0.995000 | 12749 |
| | | | | fun | 0.470763 | 60222 |
| | | | | Video | 0.259470 | 33186 |
| | | | | Football | 0.226073 | 28958 |
| | | | | music | 0.148228 | 18963 |
| | | | | dance | 0.124404 | 15888 |
| | | | | quotation | 0.094198 | 12044 |
| | | | | clothing | 0.076465 | 9738 |

At block 613, based on the above data, an average quality score of each account, an average correlation degree between the account and each category and an average reliability degree of the account in each category are obtained, as shown in table 6.

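TABLE 6

| account | Number of microblogs | Quality score | Related category | Correlation degree | Reliability degree |
|---|---|---|---|---|---|
| a | 2 | 144359 | Basketball | 0.960000 | 138031 |
| | | | auto | 0.255381 | 41931 |
| | | | Digital | 0.241554 | 39661 |
| | | | Current events | 0.153356 | 19109 |
| | | | Military | 0.128199 | 15974 |
| b | 3 | 135986 | Military | 0.333333 | 43594 |
| | | | basketball | 0.331667 | 42512 |
| | | | Constellation | 0.326667 | 48713 |
| | | | Fun | 0.156921 | 20113 |
| | | | Current events | 0.143588 | 21412 |
| | | | Quotation | 0.132650 | 19781 |
| | | | Politics | 0.109044 | 14261 |
| | | | Video | 0.086490 | 11086 |
| | | | science | 0.076667 | 10026 |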

At block 614, for each account, the category in which the account has the highest average correlation degree is selected as the category that the account belongs to. For example, as shown in table 6, account “a” belongs to category “basketball” and account “b” belongs to category “military”.

At block 615, a hot account is obtained. Herein, suppose a selection criterion of the hot account is that the following three conditions are met:

1) quality score>70000;

2) correlation degree>0.3; and

3) reliability degree>65000.

According to the above selection criterion, account a is a hot account in category “basketball” and account b is discarded.
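Blocks 614 and 615 can be sketched in Python as follows, using the per-account averages listed in table 6 (only the top categories are copied here). The `AccountStats` class and the function names are hypothetical. The three thresholds are the ones supposed in the selection criterion above.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    avg_quality: float
    # category -> (average correlation degree, average reliability degree)
    per_category: dict


def account_category(stats: AccountStats) -> str:
    """Block 614: the category with the highest average correlation degree."""
    return max(stats.per_category, key=lambda c: stats.per_category[c][0])


def is_hot(stats: AccountStats, q_min=70000, c_min=0.3, r_min=65000) -> bool:
    """Block 615: all three conditions must hold in the account's category."""
    corr, rel = stats.per_category[account_category(stats)]
    return stats.avg_quality > q_min and corr > c_min and rel > r_min


accounts = {
    "a": AccountStats(144359, {"basketball": (0.960000, 138031),
                               "auto": (0.255381, 41931)}),
    "b": AccountStats(135986, {"military": (0.333333, 43594),
                               "basketball": (0.331667, 42512),
                               "constellation": (0.326667, 48713)}),
}

for name, stats in accounts.items():
    print(name, account_category(stats), is_hot(stats))
```

Account b meets the quality and correlation conditions but its reliability degree in “military” (43594) is below 65000, which is why it is discarded.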

Through the above blocks, a hot account is obtained. After the microblog system receives a microblog posted by the hot account, the following blocks 602 to 606 may be performed.

Suppose microblogs of three hot accounts A, B and C are received, as shown in table 7.

TABLE 7

| Hot account | category | Microblog index | Microblog contents |
|---|---|---|---|
| A | quotation | 1 | Love means never having to say you're sorry |
| | | 2 | Your opening shows great promise, and yet flashy purple patches; as when describing a sacred grove, or the altar of Diana, or a stream meandering through fields, or the river Rhine, or a rainbow; but this was not the place for them. If you can realistically render a cypress tree, would you include one when commissioned to paint a sailor in the midst of a shipwreck? |
| | | 3 | Poetic diction treats the manner in which language is used, and refers not only to the sound but also to the underlying meaning and its interaction with sound and form. Many languages and poetic forms have very specific poetic dictions, to the point where distinct grammars and dialects are used specifically for poetry. |
| B | basketball | 4 | the last minute kill shot |
| | | 5 | Jeremy Lin scored 28 points and dished out a career-high 14 assists as the New York Knicks got back on the winning column after defeating the Dallas Mavericks, winners of six straight, with a 104-97 victory at the MSG |
| | | 6 | Lin has now surpassed 20 points for the eighth time in nine contests on Sunday, as he shot 11-of-20 from the floor and 3-of-6 from the three-point line. He scored 16 points in the second half, which included a couple of timely three-pointers in the fourth quarter. |
| C | food | 7 | Certain cultures highlight animal and vegetable foods in their raw state. Salads consisting of raw vegetables or fruits are common in many cuisines. Sashimi in Japanese cuisine consists of raw sliced fish or other meat, and sushi often incorporates raw fish or seafood. |
| | | 8 | the last minute kill shot |
| | | 9 | sales promotion, 13205593099 |

At block 602, data pre-processing is performed. Suppose that the word “diction” is in the blacklist. Thus, microblog 3 is filtered out and the other microblogs pass the pre-processing.
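The blacklist check of block 602 can be sketched in Python as follows. The tokenization by a simple regular expression and the function name are assumptions made for illustration; the text does not specify how words are segmented.

```python
import re


def contains_blacklisted_word(text: str, blacklist: set) -> bool:
    """Return True if any word of the text appears in the blacklist."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return not words.isdisjoint(blacklist)


blacklist = {"diction"}
microblog_1 = "Love means never having to say you're sorry"
microblog_3 = "Poetic diction treats the manner in which language is used"

print(contains_blacklisted_word(microblog_1, blacklist))  # False
print(contains_blacklisted_word(microblog_3, blacklist))  # True
```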

At block 603, a data repetition removing operation is performed. The microblogs are segmented to obtain notional words. A notional repetition ratio between every two microblogs is calculated. If the notional repetition ratio is higher than a predefined threshold, it is determined that the two microblogs are similar, and the one which is posted earlier is retained.

In this example, microblogs 4 and 8 have a repetition ratio higher than the predefined threshold. Therefore, microblog 8, which is posted later, is removed. Subsequent operations are performed on the other microblogs.
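A repetition removing step of this kind can be sketched in Python as follows. The text does not define the notional repetition ratio, so this sketch assumes a Jaccard ratio over the sets of notional words, a small hypothetical stop-word list, a threshold of 0.8 and made-up posting times. All of these are illustrative choices, not values from the disclosure.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and"}  # hypothetical list


def notional_words(text: str) -> set:
    """Segment the text and drop stop words (assumed segmentation rule)."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOP_WORDS}


def repetition_ratio(a: str, b: str) -> float:
    """Assumed definition: shared notional words / all notional words."""
    wa, wb = notional_words(a), notional_words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def remove_repeats(posts, threshold=0.8):
    """posts: list of (id, posted_time, text). Keeps the earliest of similar posts."""
    kept = []
    for post in sorted(posts, key=lambda p: p[1]):
        if all(repetition_ratio(post[2], k[2]) <= threshold for k in kept):
            kept.append(post)
    return kept


posts = [
    ("later", 200, "the last minute kill shot"),
    ("earlier", 100, "the last minute kill shot"),
    ("other", 150, "sales promotion, 13205593099"),
]
print([p[0] for p in remove_repeats(posts)])  # ['earlier', 'other']
```

Processing the posts in order of posting time means that, of two similar posts, the earlier one is always the one retained.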

At block 604, a microblog correlation evaluation operation is performed.

According to a correlation degree calculation method similar to that of block 612, a correlation degree of each microblog is calculated. According to the predefined correlation degree threshold of the corresponding category, it is determined whether the microblog passes the evaluation. If the evaluation is not passed, the microblog is removed. A result is shown in table 8.

TABLE 8

| Hot account | category | Correlation degree threshold | Microblog index | Microblog contents | Correlation degree | Correlation evaluation |
|---|---|---|---|---|---|---|
| A | quotation | 0.85 | 1 | Love means never having to say you're sorry | 0.83 | Not pass |
| | | | 2 | Your opening shows great promise, and yet flashy purple patches; as when describing a sacred grove, or the altar of Diana, or a stream meandering through fields, or the river Rhine, or a rainbow; but this was not the place for them. If you can realistically render a cypress tree, would you include one when commissioned to paint a sailor in the midst of a shipwreck? | 0.83 | Not pass |
| B | basketball | 0.85 | 4 | the last minute kill shot | 0.816683 | Not pass |
| | | | 5 | Jeremy Lin scored 28 points and dished out a career-high 14 assists as the New York Knicks got back on the winning column after defeating the Dallas Mavericks, winners of six straight, with a 104-97 victory at the MSG | 0.87 | pass |
| | | | 6 | Lin has now surpassed 20 points for the eighth time in nine contests on Sunday, as he shot 11-of-20 from the floor and 3-of-6 from the three-point line. He scored 16 points in the second half, which included a couple of timely three-pointers in the fourth quarter. | 1 | pass |
| C | food | 0.86 | 7 | Certain cultures highlight animal and vegetable foods in their raw state. Salads consisting of raw vegetables or fruits are common in many cuisines. Sashimi in Japanese cuisine consists of raw sliced fish or other meat, and sushi often incorporates raw fish or seafood. | 0.995 | pass |
| | | | 9 | sales promotion, 13205593099 | 0 | Not pass |

After the correlation evaluation operation, the following table 9 is obtained.

TABLE 9

| Hot account | category | Microblog index | Microblog contents |
|---|---|---|---|
| B | basketball | 5 | Jeremy Lin scored 28 points and dished out a career-high 14 assists as the New York Knicks got back on the winning column after defeating the Dallas Mavericks, winners of six straight, with a 104-97 victory at the MSG |
| | | 6 | Lin has now surpassed 20 points for the eighth time in nine contests on Sunday, as he shot 11-of-20 from the floor and 3-of-6 from the three-point line. He scored 16 points in the second half, which included a couple of timely three-pointers in the fourth quarter. |
| C | Food | 7 | Certain cultures highlight animal and vegetable foods in their raw state. Salads consisting of raw vegetables or fruits are common in many cuisines. Sashimi in Japanese cuisine consists of raw sliced fish or other meat, and sushi often incorporates raw fish or seafood. |

At block 605, a quality evaluation operation is performed on each microblog.

According to a quality score calculation method similar to that of block 612, the quality score of each microblog may be obtained. According to a quality score threshold corresponding to each category, it is determined whether a microblog passes the quality evaluation. If the quality evaluation is not passed, the microblog is removed. A result may be as shown in table 10.

TABLE 10

| Hot account | category | Quality score threshold | Microblog index | Microblog contents | Quality score | Quality score evaluation |
|---|---|---|---|---|---|---|
| B | basketball | 130000 | 5 | Jeremy Lin scored 28 points and dished out a career-high 14 assists as the New York Knicks got back on the winning column after defeating the Dallas Mavericks, winners of six straight, with a 104-97 victory at the MSG | 104586 | Not pass |
| | | | 6 | Lin has now surpassed 20 points for the eighth time in nine contests on Sunday, as he shot 11-of-20 from the floor and 3-of-6 from the three-point line. He scored 16 points in the second half, which included a couple of timely three-pointers in the fourth quarter. | 244886 | pass |
| C | Food | 150000 | 7 | Certain cultures highlight animal and vegetable foods in their raw state. Salads consisting of raw vegetables or fruits are common in many cuisines. Sashimi in Japanese cuisine consists of raw sliced fish or other meat, and sushi often incorporates raw fish or seafood. | 123785 | Not pass |

After the quality evaluation, microblog 6 is selected as a hot microblog in the category “basketball”.
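The two evaluations of blocks 604 and 605 amount to two per-category threshold filters applied in sequence. A minimal Python sketch follows, using the scores and thresholds listed in tables 8 and 10. The dictionary layout and variable names are illustrative, and quality scores are given only for the microblogs that reach block 605.

```python
# Per-category thresholds taken from tables 8 and 10.
CORRELATION_THRESHOLD = {"quotation": 0.85, "basketball": 0.85, "food": 0.86}
QUALITY_THRESHOLD = {"basketball": 130000, "food": 150000}

# Microblog index -> category of the hot account that posted it.
category = {1: "quotation", 2: "quotation", 4: "basketball",
            5: "basketball", 6: "basketball", 7: "food", 9: "food"}
# Microblog index -> correlation degree with that category (table 8).
correlation = {1: 0.83, 2: 0.83, 4: 0.816683, 5: 0.87, 6: 1.0, 7: 0.995, 9: 0.0}
# Microblog index -> quality score (table 10).
quality = {5: 104586, 6: 244886, 7: 123785}

# Block 604: keep microblogs whose correlation degree exceeds the threshold.
after_correlation = [i for i in sorted(category)
                     if correlation[i] > CORRELATION_THRESHOLD[category[i]]]

# Block 605: of those, keep microblogs whose quality score exceeds the threshold.
hot = [i for i in after_correlation
       if quality[i] > QUALITY_THRESHOLD[category[i]]]

print(after_correlation)  # [5, 6, 7]
print(hot)                # [6]
```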

In view of the above, according to the method provided by the examples of the present disclosure, it is possible to find hot microblog contents rapidly and accurately.

In accordance with the above method examples, an example of the present disclosure further provides an apparatus for determining a hot UGC. As shown in FIG. 7, the apparatus 700 includes: a processor 710 and a memory 720; wherein one or more program modules are stored in the memory 720 and to be executed by the processor 710, the one or more program modules comprise: a hot account determining module 701 and a hot UGC determining module 702.

The hot account determining module 701 is configured to

-   -   for each history UGC posted by each account, calculate a quality score of the history UGC and a correlation degree between the history UGC and each category, and
    -   determine one or more accounts in each category as hot accounts according to quality scores and correlation degrees of the history UGCs.

The hot UGC determining module 702 is configured to

-   -   calculate, after receiving a UGC newly posted by a hot account, a quality score of the newly posted UGC and a correlation degree between the newly posted UGC and the category that the hot account belongs to;
    -   determine whether the quality score is higher than a predefined quality score threshold and whether the correlation degree is higher than a predefined correlation degree threshold of the category; and
    -   determine, if the quality score is higher than the predefined quality score threshold and the correlation degree is higher than the predefined correlation degree threshold of the category, the newly posted UGC as a hot UGC in the category that the hot account belongs to.

FIG. 8 is a schematic diagram illustrating an apparatus for determining a hot UGC according to another example of the present disclosure. As shown in FIG. 8, the apparatus 800 includes: a processor 810 and a memory 820; wherein one or more program modules are stored in the memory 820 and to be executed by the processor 810, the one or more program modules comprise: a hot account determining module 801, a pre-processing module 802 and a hot UGC determining module 803.

The hot account determining module 801 is configured to

-   -   for each history UGC posted by each account, calculate a quality score of the history UGC and a correlation degree between the history UGC and each category, and
    -   determine one or more accounts in each category as hot accounts according to quality scores and correlation degrees of the history UGCs.

The pre-processing module 802 is configured to

-   -   determine, after receiving a UGC newly posted by a hot account, whether the newly posted UGC contains a word in a blacklist;
    -   discard the newly posted UGC if the newly posted UGC contains a word in the blacklist; and
    -   provide the newly posted UGC to the hot UGC determining module 803 otherwise.

The hot UGC determining module 803 is configured to

-   -   calculate, after receiving a newly posted UGC from the pre-processing module 802, a quality score of the newly posted UGC and a correlation degree between the newly posted UGC and the category that the hot account belongs to;
    -   determine whether the quality score is higher than a predefined quality score threshold and whether the correlation degree is higher than a predefined correlation degree threshold of the category; and
    -   determine, if the quality score is higher than the predefined quality score threshold and the correlation degree is higher than the predefined correlation degree threshold of the category, the newly posted UGC as a hot UGC in the category that the hot account belongs to.

FIG. 9 is a schematic diagram illustrating an apparatus for determining a hot UGC according to another example of the present disclosure. As shown in FIG. 9, the apparatus 900 includes: a processor 910 and a memory 920; wherein one or more program modules are stored in the memory 920 and to be executed by the processor 910, the one or more program modules comprise: a hot account determining module 901, a repetition removing module 902 and a hot UGC determining module 903.

The hot account determining module 901 is configured to

-   -   for each history UGC posted by each account, calculate a quality score of the history UGC and a correlation degree between the history UGC and each category, and
    -   determine one or more accounts in each category as hot accounts according to quality scores and correlation degrees of the history UGCs.

The repetition removing module 902 is configured to

-   -   determine whether at least two UGCs newly posted by a hot account are received;
    -   calculate, if at least two UGCs newly posted by the hot account are received, a text similarity ratio of each two newly posted UGCs;
    -   if the two newly posted UGCs have a text similarity ratio higher than a predefined threshold, discard the UGC which is posted later and provide the UGC which is posted earlier to the hot UGC determining module 903;
    -   if the two newly posted UGCs have a text similarity ratio not higher than the predefined threshold, provide the two newly posted UGCs to the hot UGC determining module 903.

The hot UGC determining module 903 is configured to

-   -   calculate, after receiving a newly posted UGC from the repetition removing module 902, a quality score of the newly posted UGC and a correlation degree between the newly posted UGC and the category that the hot account belongs to;
    -   determine whether the quality score is higher than a predefined quality score threshold and whether the correlation degree is higher than a predefined correlation degree threshold of the category; and
    -   determine, if the quality score is higher than the predefined quality score threshold and the correlation degree is higher than the predefined correlation degree threshold of the category, the newly posted UGC as a hot UGC in the category that the hot account belongs to.

The processor 910 may include one or more processors for executing the sets of instructions stored in the memory 920. The processor 910 is a hardware device, such as a central processing unit (CPU) or a microcontroller unit (MCU). The memory 920 is a non-transitory processor-readable storage medium, such as a RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of non-transitory storage medium known in the art.

What has been described and illustrated herein is a preferred example of the disclosure along with some of its variations. The terms, descriptions and figures used herein are set forth by way of illustration only and are not meant as limitations. Many variations are possible within the spirit and scope of the disclosure, which is intended to be defined by the following claims (and their equivalents) in which all terms are meant in their broadest reasonable sense unless otherwise indicated.

1. A method for determining a hot User Generated Content (UGC), comprising: analyzing a history UGC posted by an account in a UGC website system, and calculating a quality score of the history UGC posted by the account and a correlation degree between the history UGC and a category; determining a hot account for the category according to the quality score and correlation degree of the history UGC; after receiving a UGC newly posted by the hot account, calculating a quality score of the newly posted UGC and a correlation degree between the newly posted UGC and the category that the hot account belongs to; determining whether the quality score of the newly posted UGC is higher than a predefined quality score threshold, and whether the correlation degree between the newly posted UGC and the category that the hot account belongs to is higher than a predefined correlation degree threshold of the category; determining, if the quality score of the newly posted UGC is higher than the predefined quality score threshold and the correlation degree between the newly posted UGC and the category that the hot account belongs to is higher than the predefined correlation degree threshold, that the newly posted UGC is a hot UGC.
2. The method of claim 1, further comprising: before calculating the quality score of the newly posted UGC and the correlation degree between the newly posted UGC and the category that the hot account belongs to, determining whether the newly posted UGC contains a word in a blacklist; if the newly posted UGC does not contain the word in the blacklist, performing the process of calculating the quality score of the newly posted UGC and the correlation degree between the newly posted UGC and the category that the hot account belongs to.
3. The method of claim 1, further comprising: before calculating the quality score of the newly posted UGC and the correlation degree between the newly posted UGC and the category that the hot account belongs to, determining whether at least two newly posted UGCs are received; if at least two newly posted UGCs are received, calculating a text similarity ratio between each two newly posted UGCs; if the text similarity ratio between two newly posted UGCs is not higher than a predefined threshold, performing, for each of the two newly posted UGCs, the process of calculating the quality score of the newly posted UGC and the correlation degree between the newly posted UGC and the category that the hot account belongs to.
4. The method of claim 1, wherein the process of calculating the quality score of the newly posted UGC and the correlation degree between the newly posted UGC and the category that the hot account belongs to is performed after the newly posted UGC is received, or is performed periodically for each newly posted UGC received during a period of time.
5. The method of claim 1, wherein the analyzing the history UGC posted by the account in the UGC website system, and calculating the quality score of the history UGC posted by the account and the correlation degree between the history UGC and the category and determining the hot account for the category according to the quality score and correlation degree of the history UGC comprises: obtaining one or more history UGCs posted by the account during a period of time; for each history UGC, calculating the quality score of the history UGC and the correlation degree between the history UGC and each of a plurality of categories; calculating an average quality score of the account and an average correlation degree between the account and each category according to the quality score and correlation degree of each history UGC; determining a category that a highest correlation degree of the account corresponds to as the category that the account belongs to; determining, for the account, whether the average quality score of the account is higher than a predefined average quality score threshold and whether the average correlation degree between the account and the category that the account belongs to is higher than a predefined average correlation degree threshold; if the average quality score of the account is higher than the predefined average quality score threshold and the average correlation degree between the account and the category that the account belongs to is higher than the predefined average correlation degree threshold, determining that the account is a hot account.
6. The method of claim 5, further comprising: after calculating the quality score of the history UGC and the correlation degree between the history UGC and each category, multiplying the quality score of the history UGC with the correlation degree between the history UGC and each category to obtain a reliability degree of the history UGC in each category; and calculating an average reliability degree of the account according to the reliability degree of the account in each category; after determining that the average quality score of the account is higher than the predefined average quality score threshold and the average correlation degree between the account and the category that the account belongs to is higher than the predefined average correlation degree threshold, determining whether the average reliability degree of the account is higher than a predefined average reliability degree threshold; if the average reliability degree of the account is higher than the predefined average reliability degree threshold, determining that the account is a hot account.
7. The method of claim 1, wherein the calculating the quality score of the history UGC comprises: obtaining a total text length, a total number of words, a number of filtered words and a number of punctuations of the history UGC; determining a number of effective words of the UGC, wherein $\text{number of effective words} = \text{total number of words} - \text{number of filtered words} - \text{number of punctuations}$; determining a text basic score of the history UGC, wherein $\text{text basic score} = w5 \times \text{number of effective words} + w6 \times \text{number of filtered words}$, w5 and w6 represent weight parameters; calculating a number of repeated words and a word repetition ratio of the history UGC, wherein $\text{word repetition ratio} = \frac{\text{number of repeated words}}{\text{total number of words}}$; determining a text score of the history UGC, wherein $\text{text score} = \text{text basic score} \times \frac{\text{number of effective words}}{\text{total number of words}} \times f1 \times (1 - \text{word repetition ratio}) / w4$, f1 represents a predefined function taking the number of punctuations and the total number of words as input parameters, w4 represents a weight parameter; determining a posted time and a time score of the history UGC, wherein $\text{time score} = \frac{\text{posted time of the UGC} - \text{predefined reference time}}{w7}$, w7 represents a weight parameter; determining the quality score of the UGC, wherein the quality score = w1 × (w2 × text score + w3 × time score), w1, w2 and w3 represent weight parameters.
8. The method of claim 1, wherein the calculating the correlation degree between the history UGC and the category comprises: calculating the correlation degree according to the following formula: the correlation degree = W1*F1(weight) + W2*F2(rate) + W3*F3(rank); W1, W2 and W3 represent three weight parameters; weight denotes a weight of the category; rate denotes a value that the weight of the category is divided by a total weight; rank denotes a ranking position of the category in all categories; F1 denotes a function for normalizing the weight to 0-1; F2 denotes a function for normalizing the rate to 0-1; F3 denotes a function for normalizing the rank to 0-1.
9. An apparatus for determining a hot user generated content (UGC), comprising: one or more processors; a memory; wherein one or more program modules are stored in the memory and to be executed by the one or more processors, the one or more program modules comprise: a hot account determining module, configured to analyze a history UGC posted by an account, calculate a quality score of the history UGC and a correlation degree between the history UGC and a category, and determine, for the category, one or more accounts as hot accounts according to the quality score and the correlation degree of the history UGC; and a hot UGC determining module, configured to calculate, after receiving a UGC newly posted by the hot account, a quality score of the newly posted UGC and a correlation degree between the newly posted UGC and the category that the hot account belongs to; determine whether the quality score of the newly posted UGC is higher than a predefined quality score threshold of the category and whether the correlation degree is higher than a predefined correlation degree threshold of the category; and determine, if the quality score of the newly posted UGC is higher than the predefined quality score threshold of the category and the correlation degree is higher than the predefined correlation degree threshold of the category, the newly posted UGC as a hot UGC in the category that the hot account belongs to.
 10. The apparatus of claim 9, further comprising: a pre-processing module, configured to determine, after receiving the UGC newly posted by the hot account, whether the newly posted UGC contains a word in a blacklist; discard the newly posted UGC if the newly posted UGC contains a word in the blacklist; and provide the newly posted UGC to the hot UGC determining module otherwise.
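A possible shape for the pre-processing module of claim 10 is sketched below. The claim does not say how a UGC is split into words, so the tokenization (lowercasing and extracting runs of word characters with a regular expression) is an assumption, as are the function names `contains_blacklisted_word` and `pre_process`.

```python
import re
from typing import Iterable, List, Set


def contains_blacklisted_word(text: str, blacklist: Set[str]) -> bool:
    """Check whether any whole word of the text is in the (lowercase) blacklist."""
    # Assumed tokenization: lowercase, then runs of word characters.
    words = re.findall(r"\w+", text.lower())
    return any(word in blacklist for word in words)


def pre_process(new_ugcs: Iterable[str], blacklist: Set[str]) -> List[str]:
    """Discard UGCs containing a blacklisted word; pass the rest on."""
    lowered = {word.lower() for word in blacklist}
    return [ugc for ugc in new_ugcs
            if not contains_blacklisted_word(ugc, lowered)]


passed = pre_process(
    ["Great match today", "Buy SPAM now", "spammy is fine"],
    {"spam"},
)
```

Because whole words are matched, "spammy" is not treated as the blacklisted word "spam" here; substring matching would be an equally plausible reading of the claim.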
 11. The apparatus of claim 9, further comprising: a repetition removing module, configured to determine whether at least two UGCs newly posted by the hot account are received; calculate, if at least two UGCs newly posted by the hot account are received, a text similarity ratio of each two newly posted UGCs; if the two newly posted UGCs have a text similarity ratio higher than a predefined threshold, provide the one which is posted earlier to the hot UGC determining module; if the two newly posted UGCs have a text similarity ratio not higher than the predefined threshold, provide the two newly posted UGCs to the hot UGC determining module.
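The repetition removing module of claim 11 can be sketched as follows. The claim does not name a similarity measure, so this example assumes the ratio returned by `difflib.SequenceMatcher` from the Python standard library; the threshold value 0.8, the `(timestamp, text)` tuple layout and the function name `remove_repetitions` are likewise assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def remove_repetitions(posts: List[Tuple[float, str]],
                       threshold: float = 0.8) -> List[Tuple[float, str]]:
    """Keep a post unless it is too similar to any earlier post.

    Each post is a (posted_at, text) pair. For every pair whose similarity
    ratio is higher than the threshold, only the earlier post survives.
    """
    ordered = sorted(posts, key=lambda post: post[0])
    kept = []
    for index, (posted_at, text) in enumerate(ordered):
        is_repeat = any(
            SequenceMatcher(None, earlier_text, text).ratio() > threshold
            for _, earlier_text in ordered[:index]
        )
        if not is_repeat:
            kept.append((posted_at, text))
    return kept


unique_posts = remove_repetitions([
    (2.0, "hello world"),
    (1.0, "hello world"),
    (3.0, "completely different text"),
])
```

The pairwise comparison is quadratic in the number of newly posted UGCs, which is acceptable for a sketch but may need a cheaper similarity test at scale.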
 12. The apparatus of claim 9, wherein the hot account determining module is further configured to: obtain one or more history UGCs posted by the account during a period of time; calculate, for each history UGC, the quality score of the history UGC and the correlation degree between the history UGC and each of a plurality of categories; calculate, for the account, an average quality score of the account and an average correlation degree between the account and each category according to the quality scores and correlation degrees of the one or more history UGCs; determine, for the account, a category that a highest correlation degree of the account corresponds to as the category that the account belongs to; determine, for the account, whether the average quality score of the account is higher than a predefined average quality score threshold and whether the average correlation degree between the account and the category that the account belongs to is higher than a predefined average correlation degree threshold; and determine that the account is a hot account if the average quality score of the account is higher than the predefined average quality score threshold and the average correlation degree between the account and the category that the account belongs to is higher than the predefined average correlation degree threshold.
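The steps of claim 12 (average the per-UGC scores, pick the category with the highest average correlation degree, then test both averages against thresholds) can be sketched as below. The `ScoredUGC` type and the `classify_account` function are hypothetical; treating a category missing from a UGC as having correlation degree 0.0 and breaking ties alphabetically are assumptions not stated in the claim.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class ScoredUGC:
    """A history UGC with its already computed scores (hypothetical type)."""
    quality: float
    correlation: Dict[str, float]  # category -> correlation degree


def classify_account(history: List[ScoredUGC],
                     avg_quality_threshold: float,
                     avg_correlation_threshold: float) -> Optional[str]:
    """Return the account's category if it is a hot account, otherwise None."""
    if not history:
        return None
    avg_quality = mean(ugc.quality for ugc in history)
    categories = sorted(set().union(*(ugc.correlation for ugc in history)))
    if not categories:
        return None
    # Assumption: a category absent from a UGC counts as correlation 0.0.
    avg_correlation = {
        category: mean(ugc.correlation.get(category, 0.0) for ugc in history)
        for category in categories
    }
    # The account belongs to the category with the highest average correlation.
    category = max(categories, key=lambda name: avg_correlation[name])
    if (avg_quality > avg_quality_threshold
            and avg_correlation[category] > avg_correlation_threshold):
        return category
    return None


history = [
    ScoredUGC(0.8, {"sports": 0.9, "tech": 0.1}),
    ScoredUGC(0.6, {"sports": 0.7, "tech": 0.3}),
]
hot_category = classify_account(history, 0.5, 0.6)
```

Returning the category name rather than a bare boolean keeps the result usable by the hot UGC determining module, which needs to know which category's thresholds to apply.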
 13. The apparatus of claim 9, wherein the hot account determining module is further configured to: before calculating the quality score of the history UGC and the correlation degree between the history UGC and the category, multiply the quality score of the history UGC with the correlation degree between the history UGC and the category to obtain a reliability degree of the history UGC in the category; and calculate an average reliability degree of the account according to the reliability degree of the account in the category; after determining that the average quality score of the account is higher than the predefined average quality score threshold and the average correlation degree between the account and the category that the account belongs to is higher than the predefined average correlation degree threshold, determine whether the average reliability degree of the account is higher than a predefined average reliability degree threshold; and determine that the account is a hot account if the average reliability degree of the account is higher than the predefined average reliability degree threshold.
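The reliability degree of claim 13 is the product of a history UGC's quality score and its correlation degree with the category. A minimal sketch of the additional check follows; it assumes the average is a plain arithmetic mean over the history UGCs and that each UGC is represented as a `(quality, correlation)` pair for the category the account belongs to. Both function names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def average_reliability(scored_history: List[Tuple[float, float]]) -> float:
    """Mean of quality * correlation over the account's history UGCs."""
    return mean(quality * correlation
                for quality, correlation in scored_history)


def passes_reliability_check(scored_history: List[Tuple[float, float]],
                             threshold: float) -> bool:
    """True if the average reliability degree is higher than the threshold."""
    if not scored_history:
        return False
    return average_reliability(scored_history) > threshold


scored = [(0.8, 0.9), (0.6, 0.7)]
reliable = passes_reliability_check(scored, 0.5)
```

For the two sample UGCs the reliability degrees are 0.72 and 0.42, giving an average of 0.57, so the account passes an assumed threshold of 0.5 but not one of 0.6.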
 14. A non-transitory computer-readable storage medium comprising a set of instructions for determining a hot user generated content (UGC), the set of instructions to direct at least one processor to perform acts of: analyzing a history UGC posted by an account in a UGC website system, calculating a quality score of the history UGC posted by the account and a correlation degree between the history UGC and a category, determining a hot account for the category according to the quality score and correlation degree of the history UGC; after receiving a UGC newly posted by the hot account, calculating a quality score of the newly posted UGC and a correlation degree between the newly posted UGC and the category that the hot account belongs to; determining whether the quality score of the newly posted UGC is higher than a predefined quality score threshold and whether the correlation degree between the newly posted UGC and the category that the hot account belongs to is higher than a predefined correlation degree threshold of the category; determining, if the quality score of the newly posted UGC is higher than the predefined quality score threshold and the correlation degree between the newly posted UGC and the category that the hot account belongs to is higher than the predefined correlation degree threshold, that the newly posted UGC is a hot UGC.
 15. The non-transitory computer-readable storage medium of claim 14, further comprising: before calculating the quality score of the newly posted UGC and the correlation degree between the newly posted UGC and the category that the hot account belongs to, determining whether the newly posted UGC contains a word in a blacklist; if the newly posted UGC does not contain the word in the blacklist, performing the process of calculating the quality score of the newly posted UGC and the correlation degree between the newly posted UGC and the category that the hot account belongs to.
 16. The non-transitory computer-readable storage medium of claim 14, further comprising: before calculating the quality score of the newly posted UGC and the correlation degree between the newly posted UGC and the category that the hot account belongs to, determining whether at least two newly posted UGCs are received; if at least two newly posted UGCs are received, calculating a text similarity ratio between each two newly posted UGCs; if the text similarity ratio between two newly posted UGCs is not higher than a predefined threshold, performing, for each of the two newly posted UGCs, the process of calculating the quality score of the newly posted UGC and the correlation degree between the newly posted UGC and the category that the hot account belongs to.
 17. The non-transitory computer-readable storage medium of claim 14, wherein the process of calculating the quality score of the newly posted UGC and the correlation degree between the newly posted UGC and the category that the hot account belongs to is performed after the newly posted UGC is received, or is performed periodically for each newly posted UGC received during a period of time.
 18. The non-transitory computer-readable storage medium of claim 14, wherein the analyzing the history UGC posted by the account, and calculating the quality score of the history UGC posted by the account and the correlation degree between the history UGC and the category and determining the hot account for the category according to the quality score and correlation degree of the history UGC comprises: obtaining one or more history UGCs posted by the account during a period of time; for each history UGC, calculating the quality score of the history UGC and the correlation degree between the history UGC and each of a plurality of categories; calculating an average quality score of the account and an average correlation degree between the account and each category according to the quality score and correlation degree of each history UGC; determining a category that a highest correlation degree of the account corresponds to as the category that the account belongs to; determining, for the account, whether the average quality score of the account is higher than a predefined average quality score threshold and whether the average correlation degree between the account and the category that the account belongs to is higher than a predefined average correlation degree threshold; if the average quality score of the account is higher than the predefined average quality score threshold and the average correlation degree between the account and the category that the account belongs to is higher than the predefined average correlation degree threshold, determining that the account is a hot account.